Non-native β-sheet formation: insights into protein amyloidosis

Chinlin Guo†, Herbert Levine‡, and David A. Kessler*
†Department of Molecular Cell Biology,
Harvard University, 16 Divinity Avenue,
Room 3007, Cambridge, MA 02138
‡Department of Physics, University of California,
San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0319 and
*Department of Physics, Bar-Ilan University, Ramat-Gan, Israel

Abstract 

Protein amyloidosis is a cytopathological process characterized by the formation of highly
β-sheet-rich fibrils. How this process occurs and how to prevent or treat the associated diseases are
not completely understood. Here, we carry out a theoretical investigation of sequence-independent
β-sheet formation, based on recent findings regarding the cooperativity of hydrogen-bond network
formation. Our results strongly suggest that in vivo β-sheet aggregation is induced by inter-sheet
stacking dynamics. This leads to a prediction for the minimal length of susceptible polymer needed
to form such an aggregate. Remarkably, the prediction corresponds quite well with the critical
lengths detected in poly-glutamine-related diseases. Our work therefore provides a theoretical
framework for understanding the underlying mechanism and for informing therapeutic
strategies for protein amyloidosis.

Protein amyloidosis [1] is seen in the prion-related transmissible spongiform encephalopathies
such as Creutzfeldt-Jakob disease and in nine fatal poly-glutamine (poly-Q) related
neurodegenerative disorders, including Huntington disease, Kennedy disease, six spinocerebellar
ataxias, and dentatorubral pallidoluysian atrophy. It has also been found that normal
peptides can misfold to form amyloid aggregates. Thus, protein amyloidosis
can generally involve a sequence-independent mechanism; the critical questions are then
how this process is initiated and how to prevent its onset.

Although many different proteins can form amyloid, it has been shown that the
agents in aggregation-prone peptides are their N-terminal oligopeptide repeats, which have
a tendency to form non-native β-sheets that in turn can serve as templates to aggregate
additional mis/unfolded peptides into elongated fibrils. The rate-limiting step in this
process is the nucleation of the non-native β-sheets; moreover, the aggregation tendency
correlates well with the repeat length Δl [5]. For instance, in the Huntington disease
gene, the "normal" range of poly-Q repeats is 6 to 34 CAGs, and the pathological forms
involve tracts of 37 or more repeats. Likewise, a deletion of the prion N-terminal repeats
can suppress its spontaneous aggregation rate.

What is the biological implication of these facts? In principle, a longer Δl allows a larger
non-native β-sheet to form. Under normal physiological conditions, the nucleation of
non-native β-sheets is suppressed by intracellular scavengers, chaperones and the
ubiquitin-proteasome system, which detect and convert or degrade misfolded peptide intermediates.
It has however been found that when Δl exceeds the pathological threshold, scavengers do
not work; instead, they become non-functionally engaged with the non-native structure.
This impairment of molecular scavengers indicates that the formation of large non-native
β-sheets can undergo a rapid "two-state-like" transition, i.e., one proceeding without
any intermediate whose lifetime is long enough to be detected. It also implies that the free energy
barrier ΔG separating the two states is too high to be overcome by intracellular machinery
utilizing regular energy sources (e.g., ATP). Elucidating how the repeat length Δl is
related to these two properties is the main goal of the present work.

Methods 

We study the sequence-independent thermodynamics and kinetics of an extensive homopolymer
(to mimic the oligopeptide repeats) that is either at the tail of, or an inserted loop in,
a pathological peptide (Fig. 1). We are interested in how this homopolymer can form ordered
β-sheets that provide the most advantageous fibril template (of course, the homopolymer can
also form a single-strand α-helix or a multi-strand amorphous aggregate, which is nevertheless
less pathogenic and is not our major concern). In particular, we identify the homopolymer
length Δl and the corresponding β-sheet topology at which the two-state-like behavior emerges
with a large free energy barrier ΔG. Furthermore, we investigate the predominant kinetics
and patterns characterizing the transition (illustrated in the figures); this knowledge can
then be applied to therapy issues such as drug design targets and expected dose-response curves.

Our study proceeds in two stages. First, we study the formation of a single-layer β-sheet
(from one isolated pathological peptide) consisting of M+1 strands, each of length L residues,
making use of a model incorporating the cooperativity inherent in H-bond formation. This
cooperativity has its microscopic origin in the collective expelling of water molecules [10]
from inter-chain positions in the nascent sheet, which results in the free energy of formation
of a single H-bond being dependent on the state of the neighboring bonds along
the hairpin axis (intra-hairpin coupling) and between different hairpin segments along the
"fibril" elongation axis (inter-hairpin coupling) [11, 12]. The thermodynamics is then
calculated by Monte Carlo enumeration of the partition function. Furthermore, we utilize
the idea of droplet formation to analyze the dominant kinetic pathway for the
folding transition. Second, we incorporate sheet-sheet interactions ("stacking") and
make the simple assumption that the free energy change due to the formation of a single H-bond
within one peptide depends linearly on the local H-bond density contributed by other
nearby sheets. Using a mean-field approach, we can then compute the H-bond density
self-consistently. The precise form of our Hamiltonian as well as the details of our calculations
are available in the appendix and supplement.
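
The self-consistent step can be illustrated with a minimal sketch. It assumes a two-state reduction in which each peptide is either unfolded (Q = 0) or folded (Q = 1), and in which the folded weight is shifted linearly by the mean H-bond density of the neighbouring sheets. The function names and the parameters `beta_mu` and `f_stack` are illustrative choices for this sketch, not the model's actual implementation.

```python
import math


def folded_fraction(q_mean, beta_mu, f_stack):
    """Two-state folded weight in a mean field (illustrative assumption).

    The field beta_mu + f_stack * q_mean grows linearly with the mean
    H-bond density q_mean contributed by neighbouring sheets.
    """
    field = beta_mu + f_stack * q_mean
    return 1.0 / (1.0 + math.exp(-field))


def solve_mean_density(beta_mu, f_stack, q_start=0.0, tol=1e-12, max_iter=10000):
    """Iterate q <- folded_fraction(q) until the value stops changing."""
    q = q_start
    for _ in range(max_iter):
        q_new = folded_fraction(q, beta_mu, f_stack)
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    raise RuntimeError("mean-field iteration did not converge")


if __name__ == "__main__":
    # Same parameters, two starting points: a dilute and a dense branch.
    print(solve_mean_density(beta_mu=-4.0, f_stack=8.0, q_start=0.0))
    print(solve_mean_density(beta_mu=-4.0, f_stack=8.0, q_start=1.0))
```

With a strong enough coupling the iteration settles on different solutions depending on the starting point, which is the toy-model counterpart of coexisting dilute and stacked phases.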

Results and Discussion 

Two-state single-layer β-sheet formation. Results for this case are shown in Fig. 2.
First, there is indeed a region of two-state thermodynamics, bounded by two regions
inside which the formation of non-native β-sheets always encounters intermediate states (and
hence can be eliminated by intracellular scavengers). This implies, in accord with
experimental observation [16], that peptide N-terminal repeats that are too long do not aggregate
in vivo. Second, the kinetic barrier ΔG for the two-state transition is large (e.g., > 25 kcal/mol for
a number of H-bonds > 10^; see figure), meaning that β-sheet formation in the two-state region
cannot be easily reversed by intracellular scavengers. Inside the two-state region, we also
observe two distinct patterns that dominate the kinetics, as a function of the topology.
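
To give a feel for the scale of a 25 kcal/mol barrier, the sketch below evaluates the Boltzmann factor exp(−ΔG/k_BT). The temperature of 310 K and the attempt rate passed to `mean_waiting_time` are assumptions made only for this illustration; neither value is taken from the model.

```python
import math

K_B_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def boltzmann_factor(barrier_kcal_per_mol, temperature_k=310.0):
    """Return exp(-dG / (k_B T)) for a barrier given in kcal/mol."""
    kt = K_B_KCAL_PER_MOL_K * temperature_k
    return math.exp(-barrier_kcal_per_mol / kt)


def mean_waiting_time(barrier_kcal_per_mol, attempt_rate_per_s, temperature_k=310.0):
    """Arrhenius-style waiting time in seconds for an assumed attempt rate."""
    rate = attempt_rate_per_s * boltzmann_factor(barrier_kcal_per_mol, temperature_k)
    return 1.0 / rate


if __name__ == "__main__":
    print(boltzmann_factor(25.0))          # of order 1e-18 at 310 K
    print(mean_waiting_time(25.0, 1.0e6))  # seconds, for a hypothetical 1e6 attempts/s
```

Even with a generous attempt rate, a factor of this size pushes the waiting time far beyond any cellular time scale.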

Mechanism facilitating in vivo amyloidosis. The large ΔG also implies that the transition
would not occur in any reasonable time span. Thus, there must be another mechanism that
enhances non-native β-sheet formation. The most likely one is inter-sheet stacking, as we
note that folded β-sheets tend to stack into a nearly anhydrous structure even when rich in
polar side-chains. This is probably because a) hydration of a large-scale 2-dimensional
structure requires extensive, homogeneously distributed H-bond donors and acceptors to
interact with water molecules, and b) the formation of an extensive β-sheet H-bond network
consumes all available backbone electron pairs and protons (including the CαH···OC
interaction [17, 18]). Eventually, neighboring sheets prefer to squeeze out inter-sheet water
molecules. This leads to synergistic H-bond formation and sheet-sheet stacking, as a
function of the peptide concentration [Cp].

Sheet-sheet stacking facilitates amyloidosis. The results in Fig. 3a show that a change
of [Cp] can reduce ΔG by about 1/4 to 1/2. Also, the stacking itself has a
kinetic barrier ΔG_s for a collective onset (i.e., a phase separation between the dilute
single-layer and dense multi-layer peptide phases). This ΔG_s is different from ΔG and can be
reduced to zero if [Cp] is high enough (Fig. 3b2). More importantly, we found that a minimal
length of the peptide N-terminal repeat, Δl_min, is required for the stacking effect to occur
(Fig. 3b1). In other words, the kinetics of non-native β-sheet formation for peptides with
Δl < Δl_min is qualitatively similar to the single-layer case. But once Δl > Δl_min, the
kinetics can be accelerated by synergistic inter-sheet stacking. Consequently, non-native
β-sheet formation can be more efficient at large Δl and high [Cp]. This finding
is consistent with the positive correlation between aggregation tendency and Δl,
as observed both clinically and experimentally. Indeed, the minimal number of poly-glutamine
repeats in Huntington disease is 30-40, in surprisingly good agreement with
our estimates, which strongly suggests that in vivo aggregation is driven by multi-layer
β-sheet stacking.

Template assembly & Nucleated Conversion. Inter-sheet stacking helps β-sheet
formation at high peptide concentrations; once formed, these stacked sheets can nevertheless
dissociate if the peptide concentration is down-regulated or fluctuates to a lower level (as
commonly happens in vivo). But, with a large ΔG, the dissociated sheets can remain
in a stable single-layer form for a reasonable time span; this would allow them to serve as
templates that direct other cytoplasmic peptides into the non-native β-sheet form (Fig. 1). Thus,
sheet-sheet stacking synergistically enhances non-native β-sheet formation from both the
thermodynamic and kinetic perspectives. This is a combined nucleated conformational conversion
(NCC) and mono/oligomer-directed conversion (M/O-DC) (or template assembly, TA) process,
and might reconcile the long-standing debate on the self-perpetuating mechanism
in amyloid seed formation.

The effect of sequestering agent. Finally, we consider the implications of our work for
the dose-response curves of therapeutic agents. We consider the simplest case, in which a
scavenger can sequester unfolded mutant peptides before they pass the transition state. In this
case, binding of the sequestering agent is mutually exclusive with β-sheet H-bond network
formation; this is presumably how scavengers such as chaperones or polyamines [20]
function. A successful agent, then, should raise the bottleneck (i.e., the minimal mutant
peptide concentration [Cp]_min and the minimal length of repeats Δl_min) for the onset of
stacking. Using the mean-field approach, we found however that at high sequesterer concentration,
Δl_min is reduced (Fig. 3c) because of the modulation of stacking cooperativity (whereas
[Cp]_min is increased); this suggests that amyloid nucleation is least prohibited at intermediate
agent concentration. Interestingly, this result seems to correspond with the evidence that
huntingtin aggregation is eliminated only via the overexpression or deletion of chaperones.



Summary 

We have presented a modeling approach to the initial seed formation in protein amyloidosis.
Our work indicates that stacking is the critical effect and that simple agents that try
to interfere with H-bond network formation may not be very effective at preventing aggregation;
alternatively, the kinetics analysis (Fig. 2b) suggests that a potentially more effective
strategy would be to attempt to interfere with stacked non-native β-turns or with dissociated
non-native single-layer β-sheets. Future work will take into account competition between
β-sheets and non-trivial native structures, hence allowing for application of our ideas to a wide
variety of aggregation-prone systems. Finally, it is interesting to speculate on the fact that
even given the obvious disadvantage of an aggregation tendency, natural selection has not
replaced these oligopeptide-repeat-containing proteins by other sequence designs. Perhaps
they can serve as an essential building block for bio-architectural construction [11], or as
an evolutionary tool that aids in the addition of new sequence to an existing peptide [21],
or as a buffer that titrates chaperones and hence exposes signaling molecules whose genetic and
structural variations are normally buffered by chaperones to environmental challenges [22].

Figure captions

1. The possible kinetics for amyloid nucleation. The N-terminal repeats are sketched
as the thin curve connecting the N- and C-terminal domains (shaded) of the peptide. The
basic choice is whether monomers form aberrant sheets on their own and then either
form fibrils directly (monomer spontaneous conversion) or act as templates for further
monomer attachment (monomer-directed conversion), or alternatively whether the entire
nucleus must form cooperatively (nucleated conformational conversion). Our work
suggests a modified picture in which cooperatively formed nuclei may dissociate and
act as monomer-directed templates; see text for discussion.

2. (a) The phase diagram of a single-layer β-sheet, exhibiting the transition from two-state
(region B) to non-two-state (A1, A2) behavior as a function of sheet topology
(M, L). In region A1 (A2), there is a dominant intermediate ensemble with a partially
folded β-sheet droplet (unfolded bubble) floating along the fibril (hairpin) axis.
The insets show representative specific heat diagrams. (b) The predominant kinetic
pathway. We computed the free energy change corresponding to a sequence of
partially folded states differing by the addition of one contiguous H-bond and thereby
found the transition state and folding barrier for various paths [15]. Analysis shows
two predominant patterns: H-bonds are initiated 1) at several β-turns, followed by
symmetric propagation (symmetric zipping, B1), or 2) at a collapsed loop and an
existing turn, followed by asymmetric propagation (asymmetric non-zipping, B2).

3. (a) The free energy profile for the predominant kinetics at a given topology and
peptide concentration [Cp]. ΔG is the free energy barrier height for β-sheet formation
within a single peptide and δΔG = ΔG|_{[Cp]} − ΔG|_{[Cp]=0} (inset, dashed curve; note
its instability, which leads to a sudden jump of δΔG (solid curve) and hence the
onset of stacking). (b) The phase diagram including the stacking effect. A minimal
oligopeptide repeat length Δl_min is required for stacking (the thin dotted curve and
inset b1). Here Δl = (M+1)(L+2) (assuming two Cα residues for one β-turn). The
kinetics is also modified by the stacking effect (symmetric zipping: C1; asymmetric
non-zipping: C2). (b2) The peptide concentration [Cp]_stack at Δl = 90 required for the
onset of stacking and for barrier-less stacking; the order of magnitude of [Cp]_stack is
compatible with experiment. (c) Δl_min and (c1) the minimal peptide concentration [Cp]_min
(for Δl = 120) required for barrier-less stacking as a function of agent concentration
[C_sq]. Here K^max is the dissociation constant between the sequestering agent and a
peptide coil that can form a β-sheet with a total of N H-bonds.
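
The relation Δl = (M+1)(L+2) quoted in caption 3 can be turned into a small helper that lists which sheet topologies (M, L) are compatible with a given repeat length. This is only a bookkeeping sketch; the function names are illustrative and the restriction to M ≥ 1 and L ≥ 1 is an assumption made here.

```python
def repeat_length(m, l):
    """Repeat length for M + 1 strands of L residues plus two turn residues per strand."""
    return (m + 1) * (l + 2)


def topologies_for_length(delta_l):
    """All (M, L) with M >= 1 and L >= 1 whose repeat length equals delta_l."""
    result = []
    for strands in range(2, delta_l + 1):  # strands = M + 1
        if delta_l % strands == 0:
            l = delta_l // strands - 2
            if l >= 1:
                result.append((strands - 1, l))
    return result


if __name__ == "__main__":
    print(topologies_for_length(90))
```

For example, M = 8 and L = 8 give nine strands of ten residues each, i.e. Δl = 90.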

Appendix

The Model Hamiltonian 

We consider a single isolated sheet consisting of M + 1 strands, each of length L residues
(i.e., total number of H-bonds = ML), with the coordinates of the j-th residue (1 ≤ j ≤ L)
on the i-th strand (1 ≤ i ≤ M) labeled as x_{i,j}. Since β-sheet formation primarily involves
a competition between the solvation of random coils and the formation of a collective H-bond
network, we use an effective hydration parameter h_{i,j} = x_{i+1,j} − x_{i,j} to characterize
the local H-bonding between adjacent strands. Moreover, as we are interested in the formation of
the best β-sheet template for amyloidosis, we monitor the H-bonds between residues labeled
by the same index in the two neighboring hairpin strands. Thus, we define a parameter
Δ_{i,j} = 1 if |h_{i,j}| is small enough to exclude water (i.e., no hydration) and Δ_{i,j} = 0
otherwise, to measure the presence of H-bonds correctly contributing to the demanded β-sheet
structure. This leads to the following phenomenological model
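H_{Hb} = \sum_{i,j} \Delta_{i,j} \left[ f_1 + \tfrac{1}{2} f_2 \Delta_{i,j\pm1} + \tfrac{1}{2} f_3 \Delta_{i\pm1,j} \right]    (1)

H_{coil} = \frac{\kappa}{2} \sum_{i,j} \prod_{k=0}^{1} \left[ 1 + \gamma \Delta_{i,j-k} \right] \left[ 1 + \gamma_1 \sum_{k=\pm1} \sum_{a=0}^{1} \Delta_{i,j-a} \Delta_{i+k,j} \Delta_{i+k,j-1} \right] \left| h_{i,j} - h_{i,j-1} \right|^2    (2)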

to describe the energy of correctly formed H-bonds and the entropy of the unfolded coils.
Here, f_1 determines the basic energy of formation of a single H-bond and κ the stiffness of an
isolated strand. The other parameters are introduced to mimic the cooperativity inherent in
H-bond network formation and, as described elsewhere, are fitted by simulations on a more
microscopic model. In particular, f_2 and γ describe the intra-hairpin coupling along the
"hairpin" axis (parallel to the stacked β-strands), and f_3 and γ_1 describe the inter-hairpin
coupling along the "fibril" axis (the fibril direction, perpendicular to the hairpin axis).

Of course, when the system becomes larger, there will be mis-paired H-bonds generated
between non-adjacent or mis-slid adjacent strands [11]. The lowest-order correction to the
"bare" partition function (based on Eqns. 1 and 2) is due to a single mis-paired H-bond formed
between any two strands. This correction is estimated to be [12]



^ (M + 1)M ^ 4(2-72) Ml±i)£,, (3) 

2! (1 + 7)^ V 2\ ^ ^ ^ 

with β = 1/k_B T, and Q = Σ_{i,j} Δ_{i,j}/(ML) is the density of correctly formed H-bonds for the
sheet.
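
As a concrete illustration of the order parameter, the sketch below computes Q from a table of contact indicators Δ_{i,j}. It assumes Q is the fraction of the ML possible H-bonds that are correctly formed; the nested-list representation and the function name are choices made for this example only.

```python
def hbond_density(contacts):
    """Fraction of correctly formed H-bonds.

    contacts holds M rows (strand pairs) of L entries each; every entry is
    1 if that H-bond is formed and 0 otherwise.
    """
    n_rows = len(contacts)
    n_cols = len(contacts[0])
    if any(len(row) != n_cols for row in contacts):
        raise ValueError("all rows must have the same length L")
    formed = sum(sum(row) for row in contacts)
    return formed / (n_rows * n_cols)


if __name__ == "__main__":
    # M = 2 strand pairs, L = 3 residues, two bonds formed.
    print(hbond_density([[1, 1, 0], [0, 0, 0]]))
```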

Next, to include the stacking effect, we assume that this effect modifies the change of
free energy due to the formation of a local, single H-bond by an amount proportional to the
density of ordered H-bonds on nearby sheets. This modification includes both a stacking
energy V_s arising from non-specific van der Waals interactions and an entropy loss ΔS due
to the reduction of the radius of gyration between partially folded adjacent β-sheets. Details
of how we estimated the stacking effect are given in the supplement. Overall, the Hamiltonian
for the entire system is

H = \sum_{s} \{ H_{Hb}(s) + H_{coil}(s) \} + \frac{1}{2} \sum_{s,t} J_{st} \sum_{i,j} \Delta_{i,j}(t) \left[ g_1 - g_2 \Delta_{i,j}(s) \right]
  \equiv \sum_{s} H_{sing}(s) + \frac{1}{2} \sum_{s,t} J_{st} H_{int}(s,t)    (4)

with J_st = 1 if sheets s, t are stacked together and 0 otherwise, and Δ_{i,j}(s) are the Δ_{i,j}
parameters for sheet s. Here, we separate the entire Hamiltonian into a single-sheet part
H_sing and an inter-sheet coupling part H_int. Also, for future convenience, we have defined
notations gi = -^^In ^ and g2 = N~^[—Vs -\- gi], where Qq and Qat are the volume of 
completely unfolded coil and folded β-sheet, respectively.

The Partition Function 

Without Stacking Effect. Given the above expression, we transform the β-sheet into a
system of many coupled hairpins. Then, for a given choice of topology (M, L) and contact
configuration {Δ_{i,j}} (which then yields a particular H-bond density Q), we found that the
partition function can be dissected into three parts

Z[\{\Delta_{i,j}\}] = Z_{Hb} Z_{coil} [1 + \Gamma_2]    (5)

which allows us to compute contributions from the entropy of the coiled part (in Z_coil), the
energy of the partially formed H-bonds (in Z_Hb), and the H-bond mis-pairing effect (in Γ_2).
Again, the details of the derivation can be found in the supplement.

For the thermodynamics, we are interested in whether, at the folding temperature T_f (at
which the partition functions of a random coil, Z_0 (Q = 0), and of the completely folded state,
Z_1 (Q = 1), are equally weighted), there are other competing intermediates Z_{Q_int}. In principle,
for a two-dimensional β-sheet structure, there are two possible competing intermediates,
with a partially folded droplet (unfolded bubble) floating along the fibril (hairpin) axis with
translational entropy large enough to compete with the completely un/folded states Z_0, Z_1.
The phase boundaries separating the dominance of these ensembles are illustrated in Fig. 2a,
with representative specific heat diagrams inserted. Our approach to the study of the
kinetics of folding is described in the caption to Fig. 2 and elsewhere [11, 13].

With Stacking Effect. Using variational mean field theory to approach the thermodynamics,
we obtained the following self-consistent equation

where Z_Q[0] is the bare partition function without the stacking effect and μ is the peptide
chemical potential. μ is related to the peptide concentration [Cp], and the
relation is estimated as

To identify where the two-state-like behavior emerges, we examined whether any of the
aforementioned intermediates dominates at the temperature T_f where Z_0[0] = Z_1[0].
If Q_int is the H-bond density of the dominant ensemble, we would have ⟨Q⟩ = Q_int. This
implies that two-state behavior exists if, for all Q_int with 0 < Q_int < 1, we have

ZQjQ^nt] = ZQ,„jO]e(^^+'^^«».)Q». < [ZolQUZilQ^nt]] (9) 

for a given topology (M, L) and temperature T_f. Solving this requirement, we obtained the
phase diagram Fig. 2a; likewise, the predominant kinetic patterns can be computed in the
two-state region (see supplement for details).
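
The two-state test can be sketched in a few lines. The version below is a simplified reading of the criterion: at the folding temperature, no intermediate ensemble may outweigh the coil and folded states combined. It takes logarithms of the partition weights as input to avoid overflow; the dictionary layout and function names are assumptions made for this illustration.

```python
import math


def log_sum_exp(a, b):
    """Numerically stable ln(exp(a) + exp(b))."""
    top = max(a, b)
    return top + math.log(math.exp(a - top) + math.exp(b - top))


def is_two_state(log_z):
    """Simplified two-state test at the folding temperature.

    log_z maps an H-bond density Q (0.0 = coil, 1.0 = folded) to ln Z_Q.
    Returns True when every intermediate weight stays below Z_0 + Z_1.
    """
    log_ends = log_sum_exp(log_z[0.0], log_z[1.0])
    return all(value < log_ends for q, value in log_z.items() if 0.0 < q < 1.0)


if __name__ == "__main__":
    print(is_two_state({0.0: 0.0, 0.5: -5.0, 1.0: 0.0}))  # suppressed intermediate
    print(is_two_state({0.0: 0.0, 0.5: 2.0, 1.0: 0.0}))   # dominant intermediate
```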

Estimating the Minimal Length for Stacking 

In the two-state region, the ensemble sum can be reduced as Σ_Q Z_Q[⟨Q⟩] ≈
Z_0[⟨Q⟩] + Z_1[⟨Q⟩]; thus, after rearrangement, the self-consistent equations can be reformulated into

[Cppo = Y^^e-^--<«> (10) 

where f_stack = −2[NβV_s + ln(Ω_0/Ω_N)]. Then, from a previous study, we realize that
Eqn. (10) can yield a "van der Waals" loop in the ([Cp], ⟨Q⟩) diagram and hence a phase
separation (coexistence) effect. The phase separation referred to here is a separation between
the dilute random-coil phase and the dense stacking phase. Numerical calculation showed
that this requires a minimal length Δl_min, as shown in Fig. 3a1. The peptide concentration
that allows for phase separation can also be computed (see supplement for details); this
yields Fig. 3.
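
The loop structure can be explored numerically with a minimal sketch. It assumes the standard mean-field form βμ' = −f_stack⟨Q⟩ + ln[⟨Q⟩/(1 − ⟨Q⟩)] for the effective chemical potential, for which a loop exists only when f_stack > 4. The function names are illustrative, and the bisection solver is a choice made here rather than the procedure used for the figures.

```python
import math


def has_loop(f_stack):
    """A van der Waals loop exists only for f_stack > 4 in this mean-field form."""
    return f_stack > 4.0


def effective_potential(q, f_stack):
    """Assumed mean-field relation: beta*mu' = -f*q + ln(q / (1 - q))."""
    return -f_stack * q + math.log(q / (1.0 - q))


def spinodal_densities(f_stack):
    """Densities where d(beta*mu')/dq = 0, i.e. q(1 - q) = 1/f_stack."""
    if not has_loop(f_stack):
        raise ValueError("no spinodal for f_stack <= 4")
    root = math.sqrt(1.0 - 4.0 / f_stack)
    return (1.0 - root) / 2.0, (1.0 + root) / 2.0


def binodal_densities(f_stack, iterations=200):
    """Coexisting densities (1 -/+ z)/2 with z the nonzero root of z = tanh(f*z/4)."""
    if not has_loop(f_stack):
        raise ValueError("no coexistence for f_stack <= 4")
    low, high = 1e-9, 1.0
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if math.tanh(f_stack * mid / 4.0) > mid:
            low = mid   # still below the nonzero root
        else:
            high = mid
    z = 0.5 * (low + high)
    return (1.0 - z) / 2.0, (1.0 + z) / 2.0


if __name__ == "__main__":
    print(spinodal_densities(8.0))
    print(binodal_densities(8.0))
```

The dilute spinodal density marks where the barrier to stacking vanishes, while the binodal pair gives the densities of the coexisting dilute and dense phases.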

Effect of Sequesterers. 

The effect of sequesterers on the β-sheet partition function is estimated as follows



(11) 



where [C_sq] is the sequesterer concentration, K[{Δ_{i,j}}] is the β-sheet configuration-dependent
dissociation constant with the sequesterer, and Q' is the averaged H-bond density
from other sheets near the one interacting with sequesterers. Here the prefactor (1 − Q')
indicates that binding of a sequesterer is unlikely to occur if the target peptide is tightly
surrounded by other sheets (as they tend to stack together into an anhydrous, dense aggregate).

Mutually exclusive effects. One sequesterer might generally bind to multiple amide
(or carbonyl) groups on a single targeted peptide; here, for simplicity, we assume that the
binding of each individual sequesterer functional site to a peptide amide (or carbonyl)
group occurs independently. Moreover, we assume that the binding is mutually
exclusive with local H-bond formation, as is presumably the case for scavengers such as
polyamines [19]. Then, for a pathological peptide that can form N H-bonds in total, taking
K^max as the dissociation constant for a fully coiled peptide, we found that the peptide
concentration for sheet-sheet stacking is modified to (see supplement for detailed derivation)



(Q) 



-e 



^{Q) f stack 



(12) 



Indeed, we found that the cooperativity in sheet-sheet stacking is enhanced by the presence
of sequesterers. This is because the binding of a sequesterer is mutually exclusive with both
β-sheet H-bond formation and stacking. Thus, the thermodynamic weights of
intermediates with partially formed H-bonds and weakly stacked β-sheets are significantly
suppressed; only completely folded and well-stacked sheets can escape from the attack of
sequesterers. In the presence of a sequesterer, we are interested in its dose-response curve.
Specifically, we compute the minimal peptide concentration [Cp]_min as well as the minimal length
of repeats Δl_min required for downhill stacking. The numerical results are shown in Fig. 3.

I. ESTIMATING THE STACKING EFFECT



To proceed, we first notice that V_s is actually sequence-dependent. In general, energies
of this type range from −10^^ to −0.2 kcal/mol. As we are interested in the generic
behavior independent of sequence, we take V_s = −0.1 kcal/mol as the size of the non-specific
Cα-Cα coupling to mimic the stacking energy. Since this non-specific interaction between
two H-bond units (from two different, nearby sheets) occurs only when both units have
formed H-bonds, it contributes an energy between two stacked sheets s and t: H_vs(s,t) =
Σ_{i,j} Δ_{i,j}(s) Δ_{i,j}(t) V_s, where Δ_{i,j}(s) (Δ_{i,j}(t)) is for the (i, j)-th H-bond on sheet s (t).
The stacking energy for the entire system is then E_vs = (1/2) Σ_{s,t} J_st H_vs(s,t), where
J_st = 1 if s, t are stacked sheets and 0 otherwise. Note that there are at most two sheets t
that can stack with sheet s.
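
The stacking energy defined above can be evaluated directly from the contact indicators. The sketch below assumes the sheets are arranged in a linear pile, so that each sheet touches at most two neighbours and each adjacent pair is counted once (which is what the factor 1/2 in the double sum achieves). The list-of-lists representation and the function names are assumptions for this example.

```python
V_S = -0.1  # kcal/mol per stacked H-bond pair (value quoted in the text)


def pair_stacking_energy(sheet_s, sheet_t, v_s=V_S):
    """H_vs(s, t): V_s times the number of positions bonded in both sheets."""
    overlap = sum(
        a * b
        for row_s, row_t in zip(sheet_s, sheet_t)
        for a, b in zip(row_s, row_t)
    )
    return v_s * overlap


def total_stacking_energy(sheets, v_s=V_S):
    """E_vs for a linear pile of sheets, counting each adjacent pair once."""
    return sum(
        pair_stacking_energy(sheets[k], sheets[k + 1], v_s)
        for k in range(len(sheets) - 1)
    )


if __name__ == "__main__":
    folded = [[1, 1, 1], [1, 1, 1]]
    unfolded = [[0, 0, 0], [0, 0, 0]]
    print(total_stacking_energy([folded, folded, folded]))
    print(total_stacking_energy([folded, unfolded, folded]))
```

An unfolded sheet in the middle of the pile contributes nothing, reflecting the requirement that both H-bond units be formed for the interaction to occur.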

Second, we notice that the entropy reduction in stacked sheets arises merely from their
structural conflicts. When the sheets are all in their completely folded state, there is no
entropy reduction upon stacking since they are already in their minimal entropy state.
Similarly, the wandering of random coils is less hindered when contiguous to a completely
unfolded sheet than to a partially or completely folded one. For simplicity, we estimate
the entropy of different conformations based on their volume. As an example, the volume
of a fully unfolded coil, Ω_0, can be estimated from the random-flight chain model as Ω_0 ≈
[√l R_Cα]^3, where l is the number of total residues in the mutant peptide N-terminal repeats,
and the unit length for the radius of gyration is R_Cα ≈ 11.4 Å. Likewise, the volume of
a completely folded β-sheet with a total of N H-bonds (N = ML) can be estimated as
Ω_N = N v_Hb, where v_Hb ≈ 4.8 × 3.8 × 10.0 Å^3 is the unit volume of a single H-bond measured
in a densely packed β-sheet aggregate.

Now, to estimate the entropy reduction and keep the formula simple, we use a global
term instead of counting all local effects as done in the case of V_s. Specifically, for a given
sheet s that is surrounded by two other sheets, indexed by t, we introduce the quantity
Q' = Σ_t J_st Σ_{i,j} Δ_{i,j}(t)/(2N) to account for the averaged H-bond density in these two sheets,
as a global indication of how well they form β-sheet structures. Moreover, for simplicity we
assume that the entropy loss of sheet s can be roughly accounted for by a progressive volume
reduction from Ω_0 to Ω_N due to the structural conflict with its stacking neighbors, for
which we take the simple form H_ΔS(s) = (1 − Q) Q' k_B T ln[Ω_0/Ω_N]. Here, the factor (1 − Q)Q'
indicates the structural conflict between sheet s and its contiguous neighbors. Of course, one
can devise a more complicated formula to account for how the presence of H-bonds affects
the entropy reduction, but this simple form will suffice for our present needs.
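
The volume-based entropy estimate can be written out as a short sketch. The unit volume per H-bond follows the figures quoted in the text; the unit length `r_unit` is left as an explicit argument because its value should be treated as an assumption, as should the temperature of 310 K and the function names.

```python
import math

K_B = 0.0019872041        # Boltzmann constant in kcal/(mol K)
V_HB = 4.8 * 3.8 * 10.0   # unit volume of one H-bond in cubic angstroms


def coil_volume(n_residues, r_unit):
    """Random-flight estimate: (sqrt(l) * r_unit)^3 for l residues."""
    return (math.sqrt(n_residues) * r_unit) ** 3


def sheet_volume(n_hbonds):
    """Volume of a fully folded sheet with n_hbonds H-bonds."""
    return n_hbonds * V_HB


def entropy_penalty(q, q_neighbors, n_residues, n_hbonds, r_unit, temperature_k=310.0):
    """(1 - Q) * Q' * k_B * T * ln(coil volume / sheet volume), in kcal/mol."""
    ratio = coil_volume(n_residues, r_unit) / sheet_volume(n_hbonds)
    return (1.0 - q) * q_neighbors * K_B * temperature_k * math.log(ratio)


if __name__ == "__main__":
    # Unfolded sheet (Q = 0) next to fully folded neighbours (Q' = 1).
    print(entropy_penalty(0.0, 1.0, n_residues=120, n_hbonds=100, r_unit=11.4))
```

The penalty vanishes both for a fully folded sheet (Q = 1) and for unfolded neighbours (Q' = 0), matching the two limiting cases discussed above.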
Overall, the Hamiltonian for the entire system is

H = \sum_{s} \left[ H_{Hb}(s) + H_{coil}(s) \right] + E_{vs} + \sum_{s} H_{\Delta S}(s)
  \equiv \sum_{s} H_{sing}(s) + \frac{1}{2} \sum_{s,t} J_{st} H_{int}(s,t)    (1)

with the notation defined in the main text.

II. ESTIMATING THE PARTITION FUNCTION
A. Without Stacking Effect 



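Given the Hamiltonian

H_{Hb} = \sum_{i,j} \Delta_{i,j} \left[ f_1 + \tfrac{1}{2} f_2 \Delta_{i,j\pm1} + \tfrac{1}{2} f_3 \Delta_{i\pm1,j} \right]    (2)

H_{coil} = \frac{\kappa}{2} \sum_{i,j} \prod_{k=0}^{1} \left[ 1 + \gamma \Delta_{i,j-k} \right] \left[ 1 + \gamma_1 \sum_{k=\pm1} \sum_{a=0}^{1} \Delta_{i,j-a} \Delta_{i+k,j} \Delta_{i+k,j-1} \right] \left| h_{i,j} - h_{i,j-1} \right|^2    (3)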

we can transform the β-sheet into a system of many coupled hairpins. Then, for a given
choice of topology (M, L) and contact configuration {Δ_{i,j}} (which then yields a particular
H-bond density Q), we further dissect each of these hairpins into one or several unfolded coils
("bubbles") separated by successively folded segments ("droplets"). The partition function
for these unfolded coils is estimated by previously developed methods. Specifically,
we define a functional M_i(j_1, j_2) as the Gaussian path integral (based on Eqn. 3) for a coil
running from residue index (i, j_1 + 1) to (i, j_2 − 1) with both (i, j_1) and (i, j_2) being H-bonded,
and a functional W_i(j_1, j_2) for a continuously H-bonded segment from index (i, j_1) to (i, j_2)
(based on Eqn. 2)
−βH_Hb({Δ_{i,j}})
These Gaussian integrals can be worked out as shown in earlier work. The partition
function can be broken up into a successive multiplication of these functionals
Z[\{\Delta_{i,j}\}] = \int \prod_{i,j} dh_{i,j} \, e^{-\beta [ H_{Hb}(\{\Delta_{i,j}\}) + H_{coil}(\{\Delta_{i,j}\}) ]} [1 + \Gamma_2(Q)]
  = \prod_{i} \left[ W_i(0, j_1) M_i(j_1, j_2) W_i(j_2, j_3) \cdots \right] [1 + \Gamma_2(Q)] = Z_{Hb} Z_{coil} [1 + \Gamma_2]    (6)

where (i, j_1), (i, j_2), ··· are the end points of the successive folded segments. Thus, in this
way, we can dissect the free energy into contributions from the Gaussian integral
(in Z_coil), from the Ising-model-like energy component (in Z_Hb), and from the H-bond
mis-pairing effect (in Γ_2). This allows us to fit the parameters with simulations on a microscopic
model; once obtained, these parameters can be used to retrieve the entire partition
function and hence the density of states using multicanonical Monte Carlo sampling.



B. With Stacking Effect 

In the presence of the stacking effect, the partition functional for the entire system reads

where ∫Dh indicates integration over the entire h_{i,j}(s) vector space. To compute the
thermodynamics, we use a Gaussian trick implemented in earlier work to decouple one of the
inter-sheet terms. Briefly, we insert a Gaussian integral identity into the partition functional
to separate the Δ_{i,j}(s)Δ_{i,j}(t) terms

Then, using the transform ζ_{i,j}(s) = βg_2 Σ_t J_st η_{i,j}(t) and the identity Σ_t J_st Σ_{i,j} Δ_{i,j}(t) = 2NQ'(s),
we have


Next, we use variational mean field theory to approach the thermodynamics. First, we
set Q' = ⟨Q⟩, the thermodynamically averaged H-bond density. Second, we
rewrite the partition function in the form of a grand canonical ensemble by inserting a chemical
potential μ for each peptide. These steps allow us to decouple the partition function
into a product of independent single-peptide partition functions, where each peptide interacts
with a background field {η}. Finally, the connection between {η} and ⟨Q⟩ can be found
by a variational approach on {η}, similar to the procedure used in earlier work. This yields
a self-consistent relation η_{i,j}(s) = ⟨Δ_{i,j}(s)⟩, where ⟨Δ_{i,j}(s)⟩ is the thermodynamic expectation
value, which by definition is equivalent to ⟨Q⟩ in the mean-field sense. Overall,
one can work out the following self-consistent equation

where Z_Q[0] is defined as in the main text. Note that the entropy effect is scaled down
and hence is less important than the energy term.

The loss of translational entropy of one peptide due to the presences of other peptides 
is absorbed into the chemical potential fi in this mean-field approach Q. To estimate 
the relation between /i and peptide concentration [Cp], we use a dilute gas of weakly folded 
peptides as the basic state which then "competes" with aggregate formation. In other words, 
the peptide concentration is that before the onset of aggregation. Specifically, we estimated 
the probability of finding a partially folded peptide in a unit volume that can be caught 
by other peptides. The size of this volume is estimated roughly to Vq ~ 47r[i?G((5)]^/3 
with Rg{Q) and Q denoting the radius of gyration and H-bond density of that peptide, 
respectively. $R_G(Q)$ is further estimated by a linear interpolation $R_G(Q) = (1 - Q)\sqrt{L}\,R_{C\alpha} + 
Q\,[V_{Hb}]^{1/3}$ between the volume estimates of fully unfolded and fully folded $\beta$-sheets. Then, $[C_p]$ is connected 
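to $\mu$ via the probability $P_p$ of finding a peptide in the volume $V_{\langle Q\rangle\approx 0}$: 

$$[C_p]\,V_{\langle Q\rangle\approx 0} = P_p = \left.\frac{e^{\beta\mu}\sum_{Q<1} Z_Q[\langle Q\rangle]}{1 + e^{\beta\mu}\sum_{Q<1} Z_Q[\langle Q\rangle]}\right|_{\langle Q\rangle\approx 0} \qquad (12)$$

where $V_{\langle Q\rangle\approx 0} \approx V_0$. Thus, for any given $M$, $L$, and $T$, we can compute $Z_Q[\langle Q\rangle]$ by Monte Carlo 
simulation, and then combine it with the given $[C_p]$ to solve for $\langle Q\rangle$ self-consistently. 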
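
The link between concentration and chemical potential can be made concrete with a short calculation. The sketch below converts a molar concentration into an occupancy probability for a sphere of radius $R_G$ and then inverts a grand-canonical occupancy of the usual form $P = e^{\beta\mu}Z/(1 + e^{\beta\mu}Z)$ to obtain $\beta\mu$. That functional form, the function names, and any numerical inputs (radius, concentration, dilute-state weight `z_dilute`) are assumptions chosen for illustration rather than values from the paper.

```python
import math

AVOGADRO = 6.02214076e23          # particles per mole
NM3_PER_LITER = 1.0e24            # 1 L = (1e8 nm)^3

def gyration_volume_nm3(r_g_nm):
    """Volume of a sphere whose radius is the radius of gyration (nm^3)."""
    return 4.0 * math.pi * r_g_nm ** 3 / 3.0

def occupancy_probability(conc_molar, volume_nm3):
    """Expected number of peptides in the volume; a probability when << 1."""
    number_density_per_nm3 = conc_molar * AVOGADRO / NM3_PER_LITER
    return number_density_per_nm3 * volume_nm3

def beta_mu_from_occupancy(p, z_dilute):
    """Invert p = e^{beta mu} Z / (1 + e^{beta mu} Z) for beta*mu (assumed form)."""
    return math.log(p / (1.0 - p)) - math.log(z_dilute)
```

A 1 M solution corresponds to about 0.6 molecules per cubic nanometre, so micromolar concentrations and nanometre-scale volumes give occupancies far below one, which is the dilute regime the estimate relies on.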



III. MINIMAL LENGTH FOR STACKING 



In the two-state region, the ensemble sum in Eqn. (10) can be reduced as $\sum_Q Z_Q[\langle Q\rangle] \approx 
Z_0[\langle Q\rangle] + Z_1[\langle Q\rangle]$; thus, after rearrangement, Eqns. (10, 11) can be reformulated by defining 
an effective chemical potential $\mu'$ 



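$$e^{\beta\mu'} \stackrel{\mathrm{def}}{=} \frac{Z_0[0]\,e^{\beta\mu}}{1 + Z_0[0]\,e^{\beta\mu}} \qquad (13)$$

$$\langle Q\rangle = \frac{e^{\beta\mu}\,Z_1[\langle Q\rangle]}{1 + e^{\beta\mu}\left(Z_0[\langle Q\rangle] + Z_1[\langle Q\rangle]\right)} \stackrel{\mathrm{def}}{=} \frac{e^{\beta\mu'}\,e^{f_{stack}\langle Q\rangle}}{1 + e^{\beta\mu' + f_{stack}\langle Q\rangle}} \qquad (14)$$

$$\beta\mu' = -f_{stack}\langle Q\rangle - \ln\left[\frac{1}{\langle Q\rangle} - 1\right] \qquad (15)$$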



where we have used the condition that the temperature is set such that $Z_0[0] = Z_1[0]$. Then, from 
Eqn. (0,0) we have 



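$$[C_p]\,V_0 = \left.\frac{e^{\beta\mu}\,Z_0[\langle Q\rangle]}{1 + e^{\beta\mu}\,Z_0[\langle Q\rangle]}\right|_{\langle Q\rangle\approx 0} = \frac{Z_0[0]\,e^{\beta\mu}}{1 + Z_0[0]\,e^{\beta\mu}} = e^{\beta\mu'}, \quad\text{i.e.}\quad \beta\mu' = \ln\left([C_p]\,V_0\right) \qquad (16)$$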



After a straightforward calculation, one can show that a van der Waals loop in the $(\mu', \langle Q\rangle)$ 
diagram occurs only if $f_{stack} > 4$. Numerically, one can show that $f_{stack}$ depends on 
the $\beta$-sheet topology as well as on the mutant polymer length $\Delta L$ (the N-terminal oligopeptide 
repeats that construct the non-native $\beta$-sheet). Thus, the criterion $f_{stack} > 4$ yields a minimal 
length $\Delta L_{min}$ required for the onset of the stacking effect, as shown in Fig. 3a1. Also, 
from the abovementioned study 0, 0, we note that Eqn. (15) has one spinodal point at 
$\langle Q\rangle_s = [1 - \sqrt{1 - 4/f_{stack}}]/2$, where the barrier for the onset of stacking is zero (downhill 
stacking), and one binodal point $\langle Q\rangle_b = [1 - 2z]/2$ with $2z = \tanh[f_{stack}z/2]$, where the dilute 
and dense phases are equally weighted. The corresponding peptide concentrations $[C_p]$ for 
these two points can be computed from Eqns. (13, 15, 16) 

yielding Fig. 3a2. 
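
The spinodal and binodal points of a van der Waals-type relation $\beta\mu' = -f_{stack}\langle Q\rangle - \ln[1/\langle Q\rangle - 1]$ can be located with a few lines of code. In the sketch below, the spinodal follows from $d(\beta\mu')/d\langle Q\rangle = 0$, which gives $\langle Q\rangle(1 - \langle Q\rangle) = 1/f_{stack}$, and the binodal from the symmetric equal-weight condition $\beta\mu' = -f_{stack}/2$, written as $\langle Q\rangle = (1 \mp m)/2$ with $m = \tanh(f_{stack}\,m/4)$. The function names and the bisection bracket are illustrative choices, and the form of $\beta\mu'$ is taken here as an assumption.

```python
import math

def beta_mu_eff(q, f_stack):
    """Effective chemical potential (in kT) at mean H-bond density q."""
    return -f_stack * q - math.log(1.0 / q - 1.0)

def spinodal_points(f_stack):
    """Densities where d(beta*mu')/dq = 0; None if no loop (f_stack <= 4)."""
    if f_stack <= 4.0:
        return None
    root = math.sqrt(1.0 - 4.0 / f_stack)
    return 0.5 * (1.0 - root), 0.5 * (1.0 + root)

def binodal_points(f_stack, iterations=200):
    """Coexisting dilute/dense densities from m = tanh(f_stack*m/4)."""
    if f_stack <= 4.0:
        return None
    lo, hi = 1e-9, 1.0      # assumes the nontrivial root lies above 1e-9
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if math.tanh(0.25 * f_stack * mid) - mid > 0.0:
            lo = mid        # still left of the nontrivial root
        else:
            hi = mid
    m = 0.5 * (lo + hi)
    return 0.5 * (1.0 - m), 0.5 * (1.0 + m)
```

Both functions return `None` at or below the threshold $f_{stack} = 4$, which mirrors the statement that no loop, and hence no stacking transition, exists there.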



IV. EFFECT OF SEQUESTERERS 

To model the effect of sequesterers, we note that the partition function for one protein A 
interacting with another protein B, A + B ⇌ AB, has the general form $Z_A = Z_A[0] + \frac{[B]}{K_d} Z_A[1]$, 
where $Z_A[0]$ and $Z_A[1]$ are the partition functions of protein A in the absence and presence of protein 
B, respectively, and $K_d$ is the dissociation constant of the reaction. More generally, the 
second term can be expressed as a sum that includes various $K_d$'s, each representing a different 
configuration of protein A. Then, examining the effect of one sequesterer 
interacting with one non-native $\beta$-sheet, we can see that it modifies the partition function 
eqn.(l) as follows: 

$$Z_Q[\langle Q\rangle] \rightarrow Z_Q[\langle Q\rangle]\left[1 + \frac{[C_{sq}]}{K_d[\{\Delta_{jj'}\}]}\right] \qquad (18)$$
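
The two-term binding form $Z = Z[0] + ([B]/K_d)\,Z[1]$ for a protein that can bind one partner is easy to evaluate directly. The sketch below computes the combined partition function and the resulting bound fraction; the function and argument names are illustrative assumptions, and concentrations and $K_d$ only need to share the same units.

```python
def partition_with_ligand(z_free, z_bound, ligand_conc, k_d):
    """Z = Z[0] + ([B]/K_d) * Z[1] for a protein that binds one partner B."""
    return z_free + (ligand_conc / k_d) * z_bound

def bound_fraction(z_free, z_bound, ligand_conc, k_d):
    """Probability that the protein is found in the bound state."""
    bound_weight = (ligand_conc / k_d) * z_bound
    return bound_weight / (z_free + bound_weight)
```

When the free and bound internal partition functions are equal, the bound fraction reduces to the familiar $[B]/([B] + K_d)$, which is one half at $[B] = K_d$.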

To express the form of $K_d[\{\Delta_{jj'}\}]$, we further assume that the binding of each individual 
sequesterer functional site to the peptide amide (or carbonyl) group occurs independently. 
This leads to a multiplicative expansion of the second term in eqn.(18): 

$$\frac{[C_{sq}]}{K_d[\{\Delta_{jj'}\}]} = e^{\beta\mu_{sq}}\left\{\prod_{jj'}\left[1 + [1 - \Delta_{jj'}]\,e^{-\beta V_{sq}}\right] - 1\right\} \qquad (19)$$

Here, $\mu_{sq}$ is the sequesterer chemical potential. The factor $[1 - \Delta_{jj'}]$ indicates that a local 
interaction between the sequesterer and the peptide occurs only if the local H-bond is not formed, 
which then allows local binding or unbinding with a binding energy $V_{sq}$, roughly estimated 
to be −2.8 kcal/mol, comparable to the H-bond energy (which is presumably 
how the sequesterer binds to the amide or carbonyl group). Finally, the term $-1$ removes 
the completely unbound configuration and thus ensures at least one binding contact between the sequesterer 
and the peptide. In general, one can include more complicated binding patterns, but here we 
restrict our modeling to this simplest case. 
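
The independent-site assumption can be illustrated numerically. The sketch below assumes the sequesterer weight takes the product form $e^{\beta\mu_{sq}}\{\prod[1 + (1 - \Delta)\,e^{-\beta V_{sq}}] - 1\}$ over H-bond sites, with $\Delta = 1$ for a formed bond and $\Delta = 0$ otherwise. That form, the temperature of 300 K, and all function and parameter names are assumptions made for illustration; only the binding energy of −2.8 kcal/mol is taken from the text.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872041            # kcal / (mol K)
KT_300K = GAS_CONSTANT_KCAL * 300.0         # thermal energy at an assumed 300 K

def sequesterer_weight(bonds, v_sq=-2.8, beta_mu_sq=0.0, kt=KT_300K):
    """Statistical weight of having at least one sequesterer site bound.

    bonds: iterable of 0/1 flags, 1 meaning the local H-bond is formed and the
    site is therefore unavailable to the sequesterer.
    """
    site_factor = math.exp(-v_sq / kt)      # Boltzmann factor of one bound site
    product = 1.0
    for formed in bonds:
        product *= 1.0 + (1 - formed) * site_factor
    # Subtracting 1 removes the configuration with no site bound at all.
    return math.exp(beta_mu_sq) * (product - 1.0)
```

A fully formed sheet (all flags equal to 1) gives zero weight, while each broken H-bond multiplies the number of available binding patterns, so the sequesterer preferentially stabilizes the unfolded state in this picture.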



Then, for a mutant peptide that can form N H-bonds in total, we can rewrite 
eqn.(p!^, p!5| , p!^ ) as 

1 



= -fstackiQ) + In 

[a 



T/'COll 



1 + 



T^coil 
d 



(1 - m 



iQ) 



-In 

{Q)f stack 



- 1 



(20) 
(21) 